Simplifying Regular Expressions
Abstract
We consider the efficient simplification of regular expressions and suggest a quantitative comparison of heuristics for simplifying them. To this end, we propose a new normal form for regular expressions, which outperforms previous heuristics while still being computable in linear time. This allows us to determine an exact bound for the relation between the two prevalent measures of regular expression size: alphabetic width and reverse polish notation length. In addition, we show that every regular expression of alphabetic width $n$ can be converted into a nondeterministic finite automaton with ε-transitions of size at most $4\frac{2}{5}n + 1$, and prove this bound to be optimal. This answers a question posed by Ilie and Yu, who had obtained lower and upper bounds of $4n - 1$ and $9n - \frac{1}{2}$, respectively [15]. For reverse polish notation length as the input size measure, an optimal bound was recently determined by Gulan and Fernau [14]. We prove that, under mild restrictions, their construction is also optimal when alphabetic width is taken as the input size measure.
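To make the two size measures concrete, here is a minimal Python sketch. It assumes the usual definitions: alphabetic width counts occurrences of alphabet symbols, and reverse polish notation length counts every node of the syntax tree (symbols and operators). The class and function names are hypothetical and not taken from the paper, and the empty word and empty set are left out for brevity. The helper `nfa_size_bound` reads the bound in the abstract as the mixed number 4 2/5 · n + 1, which is an assumption about the typeset form.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Union


# Hypothetical syntax-tree node types for regular expressions.
@dataclass(frozen=True)
class Sym:
    char: str


@dataclass(frozen=True)
class Star:
    inner: "Regex"


@dataclass(frozen=True)
class Concat:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"


Regex = Union[Sym, Star, Concat, Alt]


def alphabetic_width(r: Regex) -> int:
    """Count the occurrences of alphabet symbols in r."""
    if isinstance(r, Sym):
        return 1
    if isinstance(r, Star):
        return alphabetic_width(r.inner)
    return alphabetic_width(r.left) + alphabetic_width(r.right)


def rpn_length(r: Regex) -> int:
    """Count all syntax-tree nodes: symbols plus operators."""
    if isinstance(r, Sym):
        return 1
    if isinstance(r, Star):
        return 1 + rpn_length(r.inner)
    return 1 + rpn_length(r.left) + rpn_length(r.right)


def nfa_size_bound(n: int) -> Fraction:
    """Upper bound from the abstract, read as (4 + 2/5) * n + 1 (assumed form)."""
    return Fraction(22, 5) * n + 1


# Example: the expression (a + b)* . a
example = Concat(Star(Alt(Sym("a"), Sym("b"))), Sym("a"))
print(alphabetic_width(example))  # 3 symbol occurrences
print(rpn_length(example))        # 3 symbols + 3 operators = 6
print(nfa_size_bound(3))          # 71/5
```

The gap between the two measures comes only from operator nodes, which is why redundant operators (for example nested stars) inflate reverse polish notation length without changing alphabetic width.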
Publication date: 2010